In [3]:
from IPython.display import HTML  # IPython.core.display is a deprecated import path
import pandas as pd
import numpy as np
HTML('''
<style>
.videoWrapper {
position: relative;
padding-bottom: 56.25%; /* 16:9 */
padding-top: 25px;
height: 0;
}
.videoWrapper iframe {
position: absolute;
top: 0;
left: 0;
width: 100%;
height: 100%;
}
p {
word-break: break-all;
white-space: normal;
}
div.prompt {display:none}
div.cell { /* Tunes the space between cells */
margin-top:1em;
margin-bottom:1em;
}
div.text_cell_render h1 { /* Main titles bigger, centered */
font-size: 2.2em;
line-height:1.4em;
text-align:center;
}
div.text_cell_render h2 { /* Parts names nearer from text */
margin-bottom: -0.4em;
}
p.fontTitle { /* Customize text cells */
font-family: 'Times New Roman';
font-size:2.5em;
line-height:1em;
padding-left:0em;
padding-right:0.5em;
}
p.fontReg { /* Customize text cells */
font-family: 'Times New Roman';
font-size:1.25em;
line-height:1em;
padding-left:3em;
padding-right:1em;
}
p.Under { /* Customize text cells */
font-family: 'Times New Roman';
font-size:0.75em;
line-height:2em;
padding-left:5em;
padding-right:3em;
}
</style>
<script>
code_show=true;
function code_toggle() {
if (code_show){
$('div.input').hide();
} else {
$('div.input').show();
}
code_show = !code_show
}
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" value="Show/hide code"></form>
<p class='fontTitle'>Neural Networks with TensorFlow and other algorithms</p>
<br>
<p class='fontReg'>Basis: This project deals with decision trees (the building blocks of random forests, which are ensembles of such trees), linear regression, and neural networks. It compares the three and tunes each one for the problem set at hand.</p>
<p class='fontReg'>Preprocessing: All the data is preprocessed by converting every term to a number, so that a mathematical formula can be run on it; fig1 shows the inputs and outputs:</p>
<p class='fontReg'> Unprocessed data:</p>
''')
Out[3]:
In [57]:
data = pd.read_csv("pima.csv", index_col=0, delimiter=',')
data.head(10)
Out[57]:
In [47]:
HTML('''<br>
<p class='fontReg'> Processed data:</p>''')
Out[47]:
In [58]:
data2 = pd.read_csv("Processed_DATA.csv", index_col=0, delimiter=',')
data2.head(10)
Out[58]:
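The notebook does not show how Processed_DATA.csv was produced, so the following is only a minimal sketch of one common way to convert text terms to numbers with pandas. The column names and values are hypothetical and are not taken from pima.csv.

```python
import pandas as pd

# Hypothetical raw rows; column names and values are made up for illustration.
raw = pd.DataFrame({
    "glucose": [148, 85, 183],
    "outcome": ["positive", "negative", "positive"],
})

processed = raw.copy()
# Map each distinct text label to an integer code.
# Categories are sorted alphabetically, so "negative" -> 0 and "positive" -> 1.
processed["outcome"] = processed["outcome"].astype("category").cat.codes

print(processed)
```

Columns that are already numeric, such as the assumed glucose column here, pass through unchanged.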
Neural Networks with Keras on top of TensorFlow:
According to https://www.keras.io, "Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research."
According to https://www.tensorflow.org, "TensorFlow is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API. TensorFlow was originally developed by researchers and engineers working on the Google Brain Team within Google's Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well."
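To make the "data flow graph" idea concrete, here is a rough sketch in plain NumPy of the forward pass of a small fully connected network. It uses neither the Keras nor the TensorFlow API, and it is not the network trained in this project. The layer sizes (8 inputs, 12 hidden units, 1 output), the activations, and the random weights are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, w1, b1, w2, b2):
    """Forward pass of a two-layer fully connected network."""
    hidden = np.maximum(0.0, x @ w1 + b1)  # matrix multiply, add bias, ReLU
    return sigmoid(hidden @ w2 + b2)       # one output probability per row

# Assumed sizes: 8 input features, 12 hidden units, 1 output.
n_features, n_hidden = 8, 12
w1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
w2 = rng.normal(scale=0.1, size=(n_hidden, 1))
b2 = np.zeros(1)

batch = rng.normal(size=(5, n_features))  # 5 made-up input rows
probs = forward(batch, w1, b1, w2, b2)
print(probs.shape)
```

Each operation (`@`, `+`, `np.maximum`, `sigmoid`) plays the role of a node in the graph, and the arrays passed between them are the tensors. A library such as Keras additionally learns the weights from data, which this sketch does not attempt.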
Linear regression:
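Fits a straight line (more generally, a hyperplane) through the data, relating one scalar dependent variable (Y) to one or more explanatory variables (X). The line is not arbitrary: its coefficients are chosen to minimize the error between predictions and observations. Wikipedia states:
"Given a data set of n statistical units, a linear regression model assumes that the relationship between the dependent variable yi and the p-vector of regressors xi is linear. This relationship is modeled through a disturbance term or error variable εi — an unobserved random variable that adds noise to the linear relationship between the dependent variable and regressors. Thus the model takes the form"
$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i = \mathbf{x}_i^{\mathsf{T}} \boldsymbol{\beta} + \varepsilon_i, \qquad i = 1, \ldots, n$$
This is a good video on the topic: [embedded video]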
Decision trees:
Makes decisions by following a "tree" of tests. Per Wikipedia: "A decision tree is a flowchart-like structure in which each internal node represents a "test" on an attribute (e.g. whether a coin flip comes up heads or tails), each branch represents the outcome of the test, and each leaf node represents a class label (decision taken after computing all attributes). The paths from root to leaf represent classification rules." This video covers the basics of the decision tree: [embedded video]
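The flowchart structure can be sketched as a hand-written tree of nested dictionaries. The feature names and thresholds below are hypothetical and were not learned from pima.csv; a real decision tree library would choose them from the training data.

```python
# Internal nodes test one attribute; leaves hold a class label.
# Feature names and thresholds are hypothetical, chosen only for illustration.
tree = {
    "feature": "glucose", "threshold": 127,
    "left": {"label": 0},                  # taken when glucose <= 127
    "right": {                             # taken when glucose > 127
        "feature": "bmi", "threshold": 30,
        "left": {"label": 0},              # bmi <= 30
        "right": {"label": 1},             # bmi > 30
    },
}

def predict(node, sample):
    """Follow one root-to-leaf path and return the leaf's class label."""
    while "label" not in node:
        goes_left = sample[node["feature"]] <= node["threshold"]
        node = node["left"] if goes_left else node["right"]
    return node["label"]

print(predict(tree, {"glucose": 150, "bmi": 35}))  # prints 1
```

Each root-to-leaf path reads as one classification rule, for example "glucose > 127 and bmi > 30 gives class 1" in this made-up tree.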
Results:
So why use the different methods?
Because neural networks can generally be more accurate once tuned and tweaked, but they usually take longer to get right than linear regression or a decision tree. Here, linear regression took around 1/100th of the time that my neural network took to complete and gained around 10% accuracy over it. The same applies to the decision tree: it is noisier than linear regression at every point in the testing, but it can reach around 1%-2% higher accuracy than linear regression. This is related to the "No Free Lunch" theorem, which in a nutshell states that no one algorithm works best for every problem.
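Comparisons like the one above need two measurements per model: accuracy and wall-clock time. A minimal sketch of both helpers follows. The function names are hypothetical, and `majority_baseline` is only a stand-in model so that the example runs; it is not one of the three models compared in this project.

```python
import time
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

def majority_baseline(y_train, n_test):
    """Stand-in model: always predicts the most common training label."""
    values, counts = np.unique(y_train, return_counts=True)
    return np.full(n_test, values[np.argmax(counts)])

# Tiny made-up label sets, only to exercise the helpers.
y_train = np.array([0, 0, 1, 0, 1])
y_test = np.array([0, 1, 0, 0])

pred, seconds = timed(majority_baseline, y_train, len(y_test))
acc = accuracy(y_test, pred)
print(f"accuracy={acc:.2f}, seconds={seconds:.6f}")
```

Timing each model's fit-and-predict step with the same helper, on the same train/test split, keeps the speed and accuracy figures comparable.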
Conclusion:
As shown in the graph above, linear regression appears to work best because its accuracy is both consistent and high (82%), unlike the much noisier decision tree and the neural network with TensorFlow, which only reached the 70% range.
https://www.tensorflow.org
https://www.keras.io
https://en.wikipedia.org/wiki/No_free_lunch_theorem
https://en.wikipedia.org/wiki/Linear_regression
https://en.wikipedia.org/wiki/Decision_tree